vectorscale: pack-positions pre-pass + geometric crossing intersection #909
Draft
northbymidwest wants to merge 1 commit into libretro:master from
Adds a per-CP pre-pass (pack-positions) that denormalizes render geometry into a single PackedPositions texture and folds the crossing curve-curve intersection into the same pass. The rasterizer reads its full per-CP geometry from PackedPositions and skips ghost extension, neighbor-index decoding, and t_branch solving in its hot loop.

New shader: pack-positions.slang

For each CP slot, packs into 3 horizontally-adjacent texels:

    col 0 = (pp.x, pp.y, prev_ci_or_-1, _)
    col 1 = (cp.x, cp.y, t_branch, validity 0=skip 1=normal 2=line)
    col 2 = (np.x, np.y, next_ci_or_-1, _)

(pp, cp, np) is the ghost-extended (pp = 2·prev - cp etc.) Bezier control triple.

t_branch is computed per CP type:

- IS_CROSSING: 2D Newton iteration on F(t,s) = B_a(t) - B_b(s) = 0, starting from (0.5, 0.5). The optimizer keeps crossings near the grid corner so the initial guess is within ~0.1 of the answer; 4 iterations drive the residual below f32 epsilon. Reads neighbor positions from both this slot's chain (N-S or E-W) and the partner slot's chain. This replaces the legacy ghost-aware inverse-correction that moved each crossing CP so the rendered curve passed through the grid corner at t=0.5. The CP now stays at its optimizer-final position and the rasterizer's wedge AA anchors at the geometric intersection B_a(t) = B_b(s).
- 2-CP chain (degenerate stem with both ends as endpoint markers): t_branch = 0.5; render geometry pre-built as a straight line so the rasterizer dispatches to its closed-form line solver via is_line.
- One-sided clamped Bezier (prev or next is endpoint): closed-form cubic project of the interior B-spline midpoint onto the clamped span. This finds the t at which the rendered clamped curve reaches the same physical "before/after sc" boundary an interior B-spline would at t=0.5.
- Else: t_branch = 0.5.

Modified: update-tjunction.slang

Drop the IS_CROSSING ghost-aware inverse-correction branch; crossings pass through unchanged. Drops the now-unused Opt2 sampler binding, read_orig_pos helper, and Opt2Size UBO field.

Modified: cell-rasterizer.slang

Replace read_pos + read_neighbors + ghost extension + 2-CP-chain construction + t_branch cubic-solver in test_one_cp with a single read_packed_cp(ci) call returning a PackedCp struct. Per-active-probe fetch count: ~6 → 4 (1 flag + 3 packed reads). resolve_hit's neighbor-direction lookups for color resolution are unchanged.

Modified: vectorscale.slangp

11 passes (was 10). pack-positions inserted between the final update-tjunction iteration (FinalPositions) and cell-rasterizer. PackedPositions framebuffer is 3.0 × source-relative wide.
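The three-texel layout and the ghost extension described above can be illustrated with a small CPU-side sketch in Python. This is not the shader code: `ghost_extend`, `pack_cp`, and `packed_texel_x` are hypothetical names, the rule that non-endpoint neighbors pass through unchanged is an assumption, and so is the column addressing `3 * slot_x + col` (inferred only from "3 horizontally-adjacent texels" and the 3.0× framebuffer width).

```python
from typing import Optional, Tuple

Vec2 = Tuple[float, float]
Texel = Tuple[float, float, float, float]

# Validity codes as listed in the commit message: 0=skip, 1=normal, 2=line.
VALID_SKIP, VALID_NORMAL, VALID_LINE = 0.0, 1.0, 2.0


def ghost_extend(cp: Vec2, neighbor: Vec2, neighbor_is_endpoint: bool) -> Vec2:
    """Return the render-space neighbor: 2*neighbor - cp for an endpoint neighbor.

    Assumption: a non-endpoint neighbor is used as-is.
    """
    if not neighbor_is_endpoint:
        return neighbor
    return (2.0 * neighbor[0] - cp[0], 2.0 * neighbor[1] - cp[1])


def pack_cp(cp: Vec2, prev: Vec2, nxt: Vec2,
            prev_ci: Optional[int], next_ci: Optional[int],
            prev_is_endpoint: bool, next_is_endpoint: bool,
            t_branch: float = 0.5,
            validity: float = VALID_NORMAL) -> Tuple[Texel, Texel, Texel]:
    """Build the three RGBA texels (col 0, col 1, col 2) for one CP slot."""
    pp = ghost_extend(cp, prev, prev_is_endpoint)
    np_ = ghost_extend(cp, nxt, next_is_endpoint)
    # A missing neighbor index is stored as -1; the A channel of cols 0/2 is unused.
    col0 = (pp[0], pp[1], float(-1 if prev_ci is None else prev_ci), 0.0)
    col1 = (cp[0], cp[1], t_branch, validity)
    col2 = (np_[0], np_[1], float(-1 if next_ci is None else next_ci), 0.0)
    return (col0, col1, col2)


def packed_texel_x(slot_x: int, col: int) -> int:
    """Hypothetical addressing: slot column slot_x owns texels 3*slot_x .. 3*slot_x+2."""
    return 3 * slot_x + col
```

With this layout a reader of the packed texture needs three fetches per CP and no further decoding, which is the property the rasterizer change relies on.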
This PR has two parts: a per-fragment performance optimization (the `pack-positions` pre-pass) and a correctness improvement for crossings (geometric curve-curve intersection). Both touch only the existing pipeline: no changes to similarity-graph, resolve-crossings, cell-graph, init-positions, or optimize-energy.

Performance: `pack-positions` pre-pass

New shader inserted between the final `update-tjunction` iteration and `cell-rasterizer`. For each CP slot, it packs the full per-CP render geometry into 3 horizontally-adjacent texels of a `PackedPositions` framebuffer:

- col 0 = `(pp.x, pp.y, prev_ci_or_-1, _)`
- col 1 = `(cp.x, cp.y, t_branch, validity 0=skip 1=normal 2=line)`
- col 2 = `(np.x, np.y, next_ci_or_-1, _)`

`(pp, cp, np)` is already ghost-extended (`pp = 2·prev − cp` etc.) for endpoint neighbors. `t_branch` is computed per CP type (see the correctness section below for crossings; closed-form cubic project for one-sided clamped Beziers; 0.5 otherwise).

What this lets the rasterizer's `test_one_cp` skip per pixel:

- the `read_pos` and `read_neighbors` fetches and neighbor-index decoding
- ghost extension
- 2-CP-chain construction
- the `t_branch` cubic solver

Cost added: 3 fetches from PackedPositions per CP probe instead of 1 from FinalPositions. Net per active probe: ~6 fetches → 4 (1 flag + 3 packed reads). The pack-positions pass itself runs once per frame and does O(num_cps) work: 3 fragments per CP slot.

Measured ~5–10% frame-time reduction on dense frames (sprites with many active CPs, or a large viewport scale, where the rasterizer is the dominant cost). Sparser frames see less: fewer active CPs means fewer per-pixel probes get past the flag check and the existing 3-sample distance² quick screen, so the rasterizer spends proportionally less of its time in the body that pack-positions actually shortens.

`resolve_hit`'s neighbor flag/dir lookups for color resolution are unchanged; neighbor indices still flow through to it via the packed texels' B channels.

Correctness: respect the optimizer's crossing positions

At a 4-way crossing the rasterizer's wedge-AA junction lines need to anchor at the geometric meeting point of the N-S and E-W B-spline curves. The previous approach didn't compute that meeting point directly: `update-tjunction` ran a ghost-aware inverse B-spline correction that relocated each crossing CP to the position that would make the rendered curve pass through the grid corner at exactly `t=0.5`. That overrode the position the optimizer had chosen: the CP got pulled away from its energy minimum back toward the integer grid corner so that the rasterizer's downstream `t_branch=0.5` assumption worked out.

This PR keeps the optimizer's crossing position and computes the actual curve-curve intersection inline in pack-positions:

- Solve `F(t, s) = B_a(t) − B_b(s) = 0` by 2D Newton iteration starting from `(t, s) = (0.5, 0.5)`. The optimizer keeps crossings near the grid corner so the initial guess is within ~0.1 of the answer; quadratic convergence gets the residual below f32 epsilon in 3 iterations (4 used for safety). Each step inverts the 2×2 Jacobian analytically; an early break on `|det(J)| < 1e-12` handles the tangent / parallel-curves degenerate case (doesn't fire in practice).
- Slot 0 gets `t_a` (parameter on the N-S curve), slot 1 gets `t_b` (E-W). The CP itself stays at the optimizer's final position, with no relocation.
- The rasterizer reads `t_branch` straight from PackedPositions and uses it as the wedge-AA junction parameter, so `J = beval(curve, t_branch)` lands on the geometric intersection at whatever `t` the two curves actually cross.
- `update-tjunction.slang` loses the IS_CROSSING branch entirely; only T-junction stem snap remains. The Opt2 sampler / read_orig_pos helper / Opt2Size UBO field are gone (no longer read).

Pipeline diff

11 passes (was 10): `pack-positions` is inserted between the final `update-tjunction` iteration (FinalPositions) and `cell-rasterizer`. `vectorscale.slangp` updated accordingly.

Verification

glslangValidator.

Co-authored-with @anthropic-ai/claude-code.
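For reviewers who want to check the crossing math outside the shader, the Newton solve for `F(t, s) = B_a(t) − B_b(s) = 0` can be sketched in Python. This is a minimal illustration, not the shader source: it assumes each curve is a uniform quadratic B-spline segment over its `(pp, cp, np)` triple (the PR text does not spell out the basis), and `bderiv` and `intersect` are hypothetical names. The start point `(0.5, 0.5)`, the 4 iterations, and the `1e-12` determinant guard follow the description above.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Triple = Tuple[Vec2, Vec2, Vec2]  # (pp, cp, np) control triple


def beval(curve: Triple, t: float) -> Vec2:
    """Evaluate the segment at t (assumed uniform quadratic B-spline basis)."""
    (px, py), (cx, cy), (nx, ny) = curve
    w0 = 0.5 * (1.0 - t) ** 2
    w1 = -t * t + t + 0.5
    w2 = 0.5 * t * t
    return (w0 * px + w1 * cx + w2 * nx, w0 * py + w1 * cy + w2 * ny)


def bderiv(curve: Triple, t: float) -> Vec2:
    """First derivative of the same segment with respect to t."""
    (px, py), (cx, cy), (nx, ny) = curve
    d0, d1, d2 = t - 1.0, 1.0 - 2.0 * t, t
    return (d0 * px + d1 * cx + d2 * nx, d0 * py + d1 * cy + d2 * ny)


def intersect(curve_a: Triple, curve_b: Triple,
              iterations: int = 4, det_eps: float = 1e-12) -> Tuple[float, float]:
    """Newton iteration on F(t, s) = B_a(t) - B_b(s), starting at (0.5, 0.5)."""
    t, s = 0.5, 0.5
    for _ in range(iterations):
        ax, ay = beval(curve_a, t)
        bx, by = beval(curve_b, s)
        fx, fy = ax - bx, ay - by
        dax, day = bderiv(curve_a, t)
        dbx, dby = bderiv(curve_b, s)
        # Jacobian J = [[dax, -dbx], [day, -dby]]; det(J) = dbx*day - dax*dby.
        det = dbx * day - dax * dby
        if abs(det) < det_eps:
            break  # tangent or parallel curves: keep the current estimate
        # Newton step (dt, ds) = -J^{-1} F, with the 2x2 inverse written out.
        t += (dby * fx - dbx * fy) / det
        s += (day * fx - dax * fy) / det
    return t, s
```

In this sketch `t` plays the role of `t_a` and `s` the role of `t_b`; evaluating `beval` on either curve at its own parameter gives the shared junction point.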